Container Format Alternatives
The goals call for a container file format that can store digital files of various kinds in their native formats.
Important criteria for the container format include:
- Cross-platform and cross-codebase. BetterGEDCOM (BG) is a data exchange format and needs to support a wide audience.
- Encapsulates files in virtually any format, including formats yet to be invented
- Support cataloging files in the container so that contents can be referenced easily and those references survive export/import.
- Allow control over encoding (e.g., base64), compression (none or choice of compression algorithms), and encryption
- Support a hierarchical directory-like structure for included files
- Provide indirection so that convenient references to content can be used. Indirect references must survive import/export and persist for archiving
- Need not provide access control for individual elements (if you can read the container, you can read each element; if you can write the container, you can write each element).
- Provides a single entity that can be relied upon to be internally consistent and complete, without stale pointers and broken references (external references can't be guaranteed).
- Should be free of burdensome licensing and royalty conditions, ideally open source.
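As a rough illustration of the cataloging, hierarchy, and indirection criteria above, here is a minimal sketch using a ZIP archive and Python's standard zipfile module. The catalog file name (catalog.json), the reference ids, and the function names are all hypothetical and are not part of any BetterGEDCOM specification. Encryption is not covered.

```python
import io
import json
import zipfile

CATALOG_NAME = "catalog.json"  # hypothetical name, not from any BG spec


def build_container(files):
    """Pack {reference_id: (path, data)} into an in-memory ZIP plus a catalog."""
    buf = io.BytesIO()
    catalog = {}
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for ref_id, (path, data) in files.items():
            zf.writestr(path, data)   # "/" in the path gives a directory-like layout
            catalog[ref_id] = path    # indirection: callers use the id, not the path
        zf.writestr(CATALOG_NAME, json.dumps(catalog, indent=2))
    return buf.getvalue()


def resolve(container_bytes, ref_id):
    """Look up a reference id in the catalog and return the stored bytes."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as zf:
        catalog = json.loads(zf.read(CATALOG_NAME))
        return zf.read(catalog[ref_id])


container = build_container({
    "photo-1": ("media/photos/grandma.jpg", b"\xff\xd8 fake jpeg bytes"),
    "census-1": ("sources/census/1900.txt", b"transcribed census page"),
})
```

Because the catalog travels inside the container, a reference such as "census-1" keeps working after the archive is exported and re-imported, even if an importing program reorganizes the internal paths and rewrites the catalog accordingly.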
Pre-existing candidates for BetterGEDCOM's container technology:
- For Office documents since Office 2007, Microsoft adopted an XML-in-a-container model and uses a ZIP-compatible container. The container is specified separately as OPC ("Open Packaging Conventions").
- OpenDocument (ODF), an open OASIS standard, includes a description of both the XML spec and a container spec. The ODF container is similar to, and potentially interoperable with, OPC, but the two differ in some respects.
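Both candidates are ordinary ZIP archives that carry a well-known catalog entry: META-INF/manifest.xml in an ODF package and [Content_Types].xml in an OPC package. The sketch below only checks for those entry names to guess which convention a package follows. The function name and return values are made up for illustration, and it does not validate the package against either specification.

```python
import io
import zipfile


def sniff_package(data):
    """Guess whether ZIP bytes look like an ODF or an OPC package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    if "[Content_Types].xml" in names:
        return "opc"
    if "META-INF/manifest.xml" in names:
        return "odf"
    return "plain-zip"
```

A tool that reads both kinds of package could use a check like this to decide which catalog to parse before looking up individual files.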
What other candidates should be considered for the Container format?
Is OpenDocument a solid choice by virtue of its widespread acceptance and various open source implementations that use it (e.g. OpenOffice.org)?
If ZIP needs to be added, add it.
I am asking a question. If there is no answer to that question, I may do as you suggest, since I have so far not seen any arguments against ZIP. But there is no use in putting it on the page if someone comes along and removes it after a few hours. That is why I have asked this question.
Personally, I think the ODF format would be a good starting point. It is widely supported, and is open.
I'm not sure about compression, but multimedia is likely to be compressed already.
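To illustrate the point about multimedia: with a ZIP-based container the compression method can be chosen per entry, so media that is already compressed can be stored as-is while text is deflated. This is a sketch only. The extension list and the add_entry helper are assumptions made for the example, not a proposal for the format.

```python
import io
import os
import zipfile

# Illustrative list only; a real tool would need a more careful policy.
ALREADY_COMPRESSED = {".jpg", ".jpeg", ".png", ".mp3", ".mp4"}


def add_entry(zf, path, data):
    """Store already-compressed media unchanged; deflate everything else."""
    ext = os.path.splitext(path)[1].lower()
    method = zipfile.ZIP_STORED if ext in ALREADY_COMPRESSED else zipfile.ZIP_DEFLATED
    zf.writestr(path, data, compress_type=method)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_entry(zf, "media/portrait.jpg", os.urandom(2048))  # stand-in for JPEG data
    add_entry(zf, "data/tree.xml", b"<person/>" * 500)

# Read back which method each entry actually used.
with zipfile.ZipFile(buf) as zf:
    methods = {info.filename: info.compress_type for info in zf.infolist()}
```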
.gramps is also a gzipped file format.
Gramps XML uses gzip (.gz, application/x-gzip)
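For comparison, gzip wraps a single byte stream rather than a set of named files, so a reader only has to detect the gzip magic bytes and decompress. The sketch below uses Python's standard gzip module. The function name is hypothetical, and the sample payload is a placeholder, not real Gramps XML.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_maybe_gzipped(data):
    """Return decompressed bytes if data is gzipped, else return it unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


sample = gzip.compress(b"<database/>")  # placeholder payload
```

Since gzip has no notion of multiple entries or directories, it would need to be paired with something like tar to meet the hierarchical-structure criterion listed above.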